raft: fix correctness bug in CommittedEntries pagination #10063

Merged (1 commit) on Oct 2, 2018

@tbg commented Sep 4, 2018

In etcd-io#9982, a mechanism to limit the size of `CommittedEntries` was
introduced. It worked by loading the applicable entries (passing the
maximum size as a hint) and emitting a `HardState` whose commit index
was truncated to match the limit applied to the entries. Unfortunately,
this was subtly incorrect when the user-provided `Entries`
implementation didn't exactly match what Raft uses internally.
Depending on whether a `Node` or a `RawNode` was used, this would
either regress the `HardState`'s commit index or outright forget to
apply entries, respectively.

Asking implementers to precisely match Raft's size-limitation
semantics was considered, but it looks like a bad idea because it puts
correctness squarely in the hands of downstream users. Instead, this
PR removes the truncation of `HardState` when limiting is active
and tracks the applied index separately. This removes the old
assumption (which the previous code tried to work around) that the
client always applies all the way up to the commit index, which
isn't true when committed entries are paginated.

See [1] for more on how this bug was discovered (CockroachDB's
implementation of `Entries` returns one more entry than Raft's when the
size limit is hit).

[1]: cockroachdb/cockroach#28918 (comment)
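To make the fixed behavior concrete, here is a minimal Go model of commit pagination with the applied index tracked separately from `HardState.Commit`. Everything in it (`entry`, `logModel`, `nextPage`, `advance`) is hypothetical and heavily simplified; it is not the etcd/raft API. Entries are reduced to an index and a byte size, and the page cut-off rule (keep at least one entry, stop before the running total exceeds `maxBytes`) is an assumption of the sketch.

```go
package main

// entry is a stand-in for raftpb.Entry, reduced to an index and a size.
type entry struct {
	Index uint64
	Size  uint64
}

// logModel is a hypothetical, simplified model of the state involved in
// commit pagination; it is not the etcd/raft API.
type logModel struct {
	entries   []entry // entries[i] holds Index i+1
	committed uint64  // HardState.Commit: never truncated by pagination
	applied   uint64  // tracked separately; may lag behind committed
}

// nextPage returns the next run of committed but unapplied entries. It
// always returns at least one entry and stops before the running size
// would exceed maxBytes. It returns nil once applied has caught up.
func (l *logModel) nextPage(maxBytes uint64) []entry {
	if l.applied >= l.committed {
		return nil
	}
	// entries[applied:committed] holds indexes applied+1..committed.
	pending := l.entries[l.applied:l.committed]
	total := pending[0].Size
	n := 1
	for ; n < len(pending); n++ {
		total += pending[n].Size
		if total > maxBytes {
			break
		}
	}
	return pending[:n]
}

// advance records that a page was applied. The commit index is untouched.
func (l *logModel) advance(page []entry) {
	if n := len(page); n > 0 {
		l.applied = page[n-1].Index
	}
}
```

With ten 40-byte entries committed and a 100-byte limit, `nextPage` hands out 1..2, 3..4, and so on up to 9..10, while the commit index stays at 10 throughout; only `applied` moves. Nothing in this model derives the applied position from the commit index, which is the assumption the PR removes.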
@tbg force-pushed the fix-commit-pagination branch from b844bdd to 7a8ab37 on September 4, 2018 12:52
@gyuho requested review from bdarnell and xiang90 on September 4, 2018 13:49
@xiang90 commented Sep 5, 2018

> Unfortunately, this was subtly incorrect
> when the user-provided Entries implementation didn't exactly
> match what Raft uses internally.

Can you explain this more?

```diff
@@ -381,13 +395,17 @@ func (n *node) run(r *raft) {
if !IsEmptySnap(rd.Snapshot) {
prevSnapi = rd.Snapshot.Metadata.Index
}
if index := rd.appliedCursor(); index != 0 {
```

`rd.CanApplyTo`? The index returned here is the index up to which the application can apply, rather than the index it has already applied.
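For reference, here is a sketch of what a helper like the one in the hunk computes, written under the reviewer's suggested name. The `ready`, `entry`, and `snapshotMeta` types and the `canApplyTo` method are hypothetical simplifications, not the etcd/raft types. The assumed rule is: return the index of the last committed entry in the `Ready` if there is one, otherwise the snapshot index, otherwise 0 (nothing to report).

```go
package main

// Hypothetical stand-ins for raftpb.Entry and snapshot metadata.
type entry struct{ Index uint64 }

type snapshotMeta struct{ Index uint64 }

// ready mirrors only the two Ready fields this sketch needs.
type ready struct {
	CommittedEntries []entry
	Snapshot         snapshotMeta
}

// canApplyTo returns the highest index the application will have applied
// once it has processed this Ready: the last committed entry if any,
// otherwise the snapshot index (0 when there is no snapshot either).
func (rd ready) canApplyTo() uint64 {
	if n := len(rd.CommittedEntries); n > 0 {
		return rd.CommittedEntries[n-1].Index
	}
	return rd.Snapshot.Index
}
```

The name follows the reviewer's point: the value describes where the application *can* apply to after handling the `Ready`, not what it has applied at the moment the `Ready` is produced.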

@xiang90 commented Sep 5, 2018

@tschottdorf

This might have a side effect on Raft applications. Previously, the commit index was never greater than the maximum index the application received in the committed entries. Now that invariant changes. We should probably document this near the entries pagination limit.
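One way to pin down the invariant that changes is to write it as a check. The `checkPage` helper below is a hypothetical sketch (not part of etcd/raft), with `entry` standing in for `raftpb.Entry`. It requires that a page of committed entries start right after the applied index, be contiguous, and end at or below the commit index. Under the old behavior the last index also matched the commit index; with pagination it may be lower.

```go
package main

import "errors"

// entry is a simplified stand-in for raftpb.Entry.
type entry struct{ Index uint64 }

// checkPage is a hypothetical helper expressing the invariants an
// application can rely on once committed entries are paginated and the
// applied index is tracked separately.
func checkPage(applied, commit uint64, page []entry) error {
	if len(page) == 0 {
		return nil
	}
	// A page picks up exactly where the last applied entry left off.
	if page[0].Index != applied+1 {
		return errors.New("page does not start at applied+1")
	}
	// Indexes within a page are contiguous.
	for i := 1; i < len(page); i++ {
		if page[i].Index != page[i-1].Index+1 {
			return errors.New("page is not contiguous")
		}
	}
	// The page may stop short of the commit index, but never pass it.
	if page[len(page)-1].Index > commit {
		return errors.New("page extends past the commit index")
	}
	return nil
}
```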

@tbg commented Sep 5, 2018

> Unfortunately, this was subtly incorrect
> when the user-provided Entries implementation didn't exactly
> match what Raft uses internally.

`slice` internally returns at least one entry, but clips the entries so that their total size is strictly below `maxBytes` (but only if `Entries` doesn't already return fewer entries than it could). If the user-provided `Entries` implementation doesn't implement exactly `size < maxBytes`, the invariant `ApplyTo == Commit` is violated:

  1. `nextEnts` gets called for the index range 1..10000.
  2. `Entries` returns 1..100, even though 1..100 already exceeds `maxBytes`. Because that is fewer entries than requested, `slice` returns early.
  3. The app persists a new `HardState` with Commit 100.
  4. Crash.
  5. `nextEnts` gets called for the index range 1..100.
  6. `Entries` returns 1..100, as before.
  7. This time Raft additionally applies its own limit, so it removes the last entry (see `slice`, which now doesn't return early).
  8. The app is asked to apply 1..99 but has 100 committed.
  9. In addition, various bugs in the Raft code come into play; they are fixed in this PR.
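Steps 1 to 8 can be replayed with a small Go model. All names are hypothetical: `raftLimit` stands in for Raft's own clipping, `greedyEntries` for a storage `Entries` that, as described for CockroachDB, also returns the entry that crosses `maxBytes`, and `sliceModel` for the old lookup that skips its own clipping when storage returned fewer entries than requested. The 10-byte entry size and 995-byte limit are made-up values chosen so that entry 100 is the one crossing the limit, and so the outcome doesn't depend on whether Raft compares with `<` or `<=`.

```go
package main

// entry is a simplified stand-in for raftpb.Entry.
type entry struct {
	Index uint64
	Size  uint64
}

// raftLimit models Raft's own clipping: keep at least one entry and stop
// before the running size would exceed maxBytes.
func raftLimit(ents []entry, maxBytes uint64) []entry {
	if len(ents) == 0 {
		return ents
	}
	total, n := ents[0].Size, 1
	for ; n < len(ents); n++ {
		total += ents[n].Size
		if total > maxBytes {
			break
		}
	}
	return ents[:n]
}

// greedyEntries models a user Entries implementation that also returns
// the entry crossing maxBytes. log[i] holds Index i+1; covers [lo, hi).
func greedyEntries(log []entry, lo, hi, maxBytes uint64) []entry {
	var out []entry
	var total uint64
	for i := lo; i < hi; i++ {
		e := log[i-1]
		out = append(out, e)
		total += e.Size
		if total > maxBytes {
			break // the crossing entry is included, then we stop
		}
	}
	return out
}

// sliceModel models the old lookup: if storage returned fewer entries
// than requested, return them as-is; otherwise clip them again.
func sliceModel(log []entry, lo, hi, maxBytes uint64) []entry {
	ents := greedyEntries(log, lo, hi, maxBytes)
	if uint64(len(ents)) < hi-lo {
		return ents // storage already limited the result: return early
	}
	return raftLimit(ents, maxBytes) // full range returned: clip again
}
```

In the first pass `sliceModel` returns 1..100 and the old scheme persists Commit 100. After the crash, the range 1..100 is requested, storage returns all of it, Raft clips it to 1..99, and the application is asked to apply one entry less than it has committed.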

There are ways to fix this while keeping the invariant, but I think that's the wrong approach because it makes it much more important that the user implements exactly the size behavior Raft wants. For example, CockroachDB has a cache for the Raft entries, so there are various code paths, and it is burdensome to prove that they all behave exactly the same. It's much nicer to treat the size limit as a "hint" that can be fulfilled approximately.

The applied index should be allowed to lag behind the commit index; I think that's how we would have done it had commit pagination been introduced earlier. It's possible that someone relies on the old behavior, but I'm not aware that it was ever documented or "intentional". CockroachDB, for example, is completely oblivious to this change. It's also better for fault tolerance to bump the commit index aggressively, even when a lot of the log still needs to be applied.

That said, it's a change worth vetting closely. I'm definitely fixing two bugs here, but I don't want to introduce a new one.

tbg added a commit to tbg/cockroach that referenced this pull request Sep 5, 2018
This works around the bug outlined in:

etcd-io/etcd#10063

by matching Raft's internal implementation of commit pagination.
Once the above PR lands, we can revert this commit (but I assume
that it will take a little bit), and I think we should do that
because the code hasn't gotten any nicer to look at.

Fixes cockroachdb#28918.

Release note: None
tbg added a commit to tbg/cockroach that referenced this pull request Sep 6, 2018
craig bot pushed a commit to cockroachdb/cockroach that referenced this pull request Sep 6, 2018
29579: storage: return one entry less in Entries r=petermattis a=tschottdorf

29631: cli: handle merged range descriptors in debug keys r=petermattis a=tschottdorf

Noticed during #29252.

Release note: None

Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
@wenjiaswe

tbg added a commit to tbg/cockroach that referenced this pull request Sep 6, 2018
@tbg commented Sep 20, 2018

@xiang90, do you have more comments?

@bdarnell merged commit 08e88c6 into etcd-io:master on Oct 2, 2018
tbg added a commit to tbg/cockroach that referenced this pull request Oct 15, 2018
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Oct 15, 2018
Picks up etcd-io/etcd#10167. Future commits will use the new setting
to replace broken logic that prevented unbounded Raft log growth.

This also picks up etcd-io/etcd#10063.

Release note: None
@tbg deleted the fix-commit-pagination branch on October 17, 2018 09:36
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Oct 17, 2018
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Nov 26, 2018